I. INTRODUCTION


Everyone wants to know how big a sample is needed. In many forms of
weapon system testing, there is always a decision about the sample size, and 
this decision is very important because an unnecessarily large sample takes 
extra time and increases costs. If the purpose of the testing is to estimate a 
value, then the test needs to give a good estimate (represented by a small 
confidence interval). At the same time it is desired to use the smallest sample 
size required for the desired accuracy. The topic of this thesis is to develop 
a way to find sample sizes when the testing is to estimate a correlation 
coefficient. 

There are many ways to find a sample size. In this thesis, the desired 
confidence interval size will be used as the basis for finding sample size. It is 
important to note that the size of the confidence interval depends upon the 
number of observations which are taken, and in general, if a bigger sample
size is used, then the confidence interval will be smaller.

The problem of finding the sample size for estimates of proportions, given
a desired confidence interval size, has been studied for a variety of cases
[Ref. 1], [Ref. 2] and [Ref. 3]. The work reported here looks at sampling done to
estimate a correlation coefficient, and the sample size that is needed to
produce a desired confidence interval for that correlation coefficient. This
work investigates, and offers guidance on, the necessary sample size
when estimation involves Pearson's R, and also discusses
the sample size problem when nonparametric statistical methods are
employed. For each of these measures the relationship between sample size
and confidence interval size will be analyzed, so that graphs and tables can
be provided to assist a decision maker in finding the necessary sample size
to obtain a desired confidence interval to estimate a correlation coefficient
value.


In Chapter II a description of the classical sample measure of correlation
(Pearson’s R) and the confidence intervals that can be developed using the 
normal approximation method will be provided. The third chapter addresses 
sample size determination for estimating a correlation coefficient using 
confidence intervals. This chapter will discuss how computer programs were 
developed and used, and graphs and tables were constructed to determine the 
required sample size to obtain a desired 95% confidence interval for a 
correlation coefficient. A comparison of methods is done to give easy to use 
results about sample sizes for estimating correlation coefficient values. Then, 
in Chapter IV, the use of Spearman and Kendall test statistics, and the problem 
of finding the sample size that is needed to produce a confidence interval of 
desired size will be described. Also, in this chapter a comparison will be done 
on the sample size results that are needed for a desired confidence interval 
size, using Pearson’s R, Spearman's r and Kendall's tau. 

The final chapter will summarize this research and provide some
suggestions for further research and study.


II. CORRELATION AND THE PEARSON PRODUCT-MOMENT CORRELATION
COEFFICIENT 


In this chapter an explanation will be given on how to use the classical 
correlation coefficient method for a desired confidence interval. First, the 
Pearson product-moment correlation coefficient will be studied. Then this 
information will be used to show how estimates of the population correlation 
coefficient may be obtained. In the final part of this chapter, different
procedures will be reviewed to find a confidence interval for the population
correlation coefficient by using the normal approximation method.


A. THE PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT 

Before determining any sample sizes, a brief introduction about the 
Pearson product-moment correlation coefficient will be provided. Gibbon 
states: “In general, if X and Y are two random variables with a bivariate 
probability distribution, their covariance, in a certain sense, reflects the 
direction and amount of correlation or correspondence between the variables.
The covariance is large and positive if there is a high probability that large
(small) values of X are associated with large (small) values of Y. On the other
hand, if the correspondence is inverse so that large (small) values of X
generally occur in conjunction with small (large) values of Y, their covariance 
is large and negative. This comparative type of correlation is referred to as 
concordance or agreement. The covariance parameter as a measure of 
correlation is difficult to interpret because its value depends on the orders of 
magnitude and units of the random variables concerned. A nonabsolute or 
relative measure of correlation circumvents this difficulty.” [Ref. 4: p.206] 


The Pearson product-moment correlation coefficient, defined as

$$\rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \, \sigma_Y} \tag{2.1}$$

[Ref. 4: p.206], is invariant under changes of scale and location in X and Y, and
in classical statistics this parameter is usually employed as the measure of
correlation in a bivariate distribution. The absolute value of the correlation
coefficient does not exceed 1, and its sign is determined by the sign of the
covariance. If X and Y are independent random variables, then their
correlation is zero, but the converse is not true in general. “If the main
justification for the use of ρ as a measure of association is that the bivariate
normal is such an important distribution in classical statistics and zero
correlation is equivalent to independence for that particular population, this
reasoning has little significance in nonparametric statistics.” [Ref. 4: p.206]
A measure of correlation between X and Y must satisfy the
following requirements in order to be a good relative measure of association:
• The measure of correlation should take values between -1 and +1;

• If the larger values of X tend to be paired with the larger values of Y, and
the smaller values of X tend to be paired with the smaller values of Y, then
the measure of correlation should be positive, and if the tendency is
strong then it should be close to +1;

• If the larger values of X tend to be paired with the smaller values of Y, and
vice versa, then the measure of correlation should be negative, and if the
tendency is strong then it should be close to -1;

• If the values of X and the values of Y are randomly paired, then the
measure of correlation should be fairly close to zero. This should be the case
when X and Y are independent.

B. ESTIMATION OF THE POPULATION CORRELATION COEFFICIENT

Most of the time, the value of the population correlation coefficient (ρ) is
unknown and must be estimated from a sample. The sample correlation
coefficient is a random variable which is used in situations where the data
consist of pairs of numbers. A bivariate random sample of size n is
represented by (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n).

Suppose a random sample of n pairs (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) is drawn
from a bivariate population with Pearson product-moment correlation
coefficient ρ. Then, in classical statistics, the estimate used for ρ is the sample
correlation coefficient R, defined as

$$R = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2 \, \sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2.2}$$

[Ref. 5: p.244], where X̄ and Ȳ are the sample means

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{2.3}$$

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \tag{2.4}$$

If the numerator and denominator in Equation 2.2 are divided by n, then R
becomes

$$R = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} \tag{2.5}$$

and it can be seen in Equation 2.5 that the numerator is the sample covariance
and the denominator is the product of the two sample standard deviations (S).
This means that the equation is similar in form to the population correlation
coefficient defined in Equation 2.1.
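
As an illustration of Equation 2.5, the following minimal Python sketch computes the sample covariance and the two sample standard deviations (each with divisor n) and takes their ratio. The function name `pearson_r` and the sample data are hypothetical and are not taken from the thesis.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation R as in Equation 2.5 (covariance over product of SDs)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and standard deviations, all with divisor n.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("R is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(pearson_r(xs, ys), 4))
```

Because the factor 1/n cancels between numerator and denominator, the same value is obtained from Equation 2.2.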

This sample measure of correlation may be used on a set of data without
any requirements, but it is difficult to interpret unless the scale of
measurement is at least interval. The important point is that R is a random
variable with a distribution function, and the distribution function of R depends
on the bivariate distribution function of (X, Y).


C. CONFIDENCE INTERVALS FOR THE CORRELATION COEFFICIENT. 

If it is desired to determine confidence intervals for ρ (the population
correlation coefficient), then the sampling distribution of the correlation
coefficient R must be known. If (X, Y) is bivariate normal, then the expected
value and variance of R are approximately

E(R) = ρ, (2.6)

and
provided n is not too small (2.7)

[Ref. 6: p.462]. Charts of confidence intervals for ρ with a confidence
coefficient of 95 percent already exist. These were determined by F. N. David and are
reproduced in Figure 5 on page 49 in Appendix A. In this figure, the abscissa
is the estimated correlation coefficient from the sample data. For each given
sample size and value of R there is a confidence interval for ρ, varying as R
goes from -1.0 to +1.0. For example, for R = 0.60 and n = 5, the 95 percent
confidence limits can be read directly from the chart.

If a figure similar to that of Appendix A is not available, or if more exact
limits for the interval are wanted, the normal distribution can be used to obtain
an approximation.


The statistic commonly used is

$$Z = \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) = \tanh^{-1} R, \tag{2.9}$$

which is distributed approximately normally with expected value

$$E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) \tag{2.10}$$

and variance

$$\sigma^2(Z) = \frac{1}{n-3} \tag{2.11}$$

[Ref. 6: p.463]. Note here that Z is not the standard normal variable. Using
this transformation, the confidence interval for ρ can be calculated. Having
calculated the estimate for ρ, namely R, we compute Z and the statistic

$$K_1 = \left[Z - \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\right]\sqrt{n-3}, \tag{2.12}$$

where K_1 approximately follows a standard normal distribution.

Using the normal approximation, there will be 95% certainty that

$$-1.96 < \frac{Z - E(Z)}{\sigma(Z)} < 1.96, \tag{2.13}$$

and the 95% confidence interval of E(Z) will be

$$\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z) < E(Z) < \frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z). \tag{2.14}$$

From 2.10,

$$\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z)\right]\right\} < \frac{1+\rho}{1-\rho} \tag{2.15a}$$

and

$$\frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z)\right]\right\}. \tag{2.15b}$$

If the left side of 2.15a is L_1 and the right side of 2.15b is U_1, then

$$L_1 < \frac{1+\rho}{1-\rho} < U_1. \tag{2.16}$$

Values for L_1 and U_1 can be computed from sample results. From 2.16 the 95%
confidence interval for ρ will be

$$\frac{L_1 - 1}{L_1 + 1} < \rho < \frac{U_1 - 1}{U_1 + 1}. \tag{2.17}$$

For example, if the data has 10 observations and the sample correlation
coefficient R = 0.60, the 95% confidence interval can be estimated. Using the
confidence belts in Figure 5 on page 49 in Appendix A, the bounds are 0.05
and 0.89. These results are rough. Using Equations 2.9, 2.10, and 2.11, we
have

$$Z = \frac{1}{2}\ln\left(\frac{1+0.60}{1-0.60}\right) = 0.6932, \qquad \sigma(Z) = \frac{1}{\sqrt{10-3}} = 0.3780.$$

The 95 percent confidence limits for E(Z) are then

$$0.6932 - 1.96\,(0.3780) < E(Z) < 0.6932 + 1.96\,(0.3780),$$

which reduce to

$$-0.047768 < E(Z) < 1.4341.$$

The inequalities can be written as

$$-0.047768 < \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < 1.4341,$$

and combining Equations 2.15a and 2.15b to obtain

$$\exp\{2 \times (-0.047768)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.4341)\}$$

results in L_1 = 0.90905 and U_1 = 17.605. Thus from Equation 2.17 the 95%
confidence interval for ρ is

$$\frac{0.90905 - 1}{0.90905 + 1} < \rho < \frac{17.605 - 1}{17.605 + 1},$$

which reduces to

$$-0.048 < \rho < 0.8925.$$
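
The steps of this worked example (Equations 2.9 through 2.17) can be sketched in Python as follows. This is only an illustrative sketch: the function name `fisher_ci` is hypothetical, and the critical value 1.96 is the one used in the text.

```python
import math

Z_CRIT_95 = 1.96  # two-sided 95% normal critical value used in the text


def fisher_ci(r, n, z_crit=Z_CRIT_95):
    """Approximate confidence interval for rho from R and n (Equations 2.9-2.17)."""
    if not -1.0 < r < 1.0:
        raise ValueError("R must lie strictly between -1 and +1")
    if n <= 3:
        raise ValueError("n must exceed 3 for the variance 1/(n-3)")
    z = math.atanh(r)                      # Equation 2.9
    sigma = 1.0 / math.sqrt(n - 3)         # Equation 2.11
    l1 = math.exp(2 * (z - z_crit * sigma))  # left side of 2.15a
    u1 = math.exp(2 * (z + z_crit * sigma))  # right side of 2.15b
    return (l1 - 1) / (l1 + 1), (u1 - 1) / (u1 + 1)  # Equation 2.17


lower, upper = fisher_ci(0.60, 10)
print(f"{lower:.3f} < rho < {upper:.4f}")
```

For R = 0.60 and n = 10 the sketch gives limits that agree with the interval above to about three decimal places.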


Confidence interval size increases as the sample correlation coefficient R
approaches zero, and the largest confidence interval that could result will
occur when R = 0. Here Z = 0, so that

$$L_1 = \exp\{-2\,(1.96)\,\sigma(Z)\}, \qquad U_1 = \exp\{2\,(1.96)\,\sigma(Z)\},$$

so that the largest confidence interval size is

$$\frac{U_1 - 1}{U_1 + 1} - \frac{L_1 - 1}{L_1 + 1} = 2\tanh\left(\frac{1.96}{\sqrt{n-3}}\right).$$

Results for this case are shown in Table 1. The table provides the largest possible
confidence interval sizes that could result for various sample sizes. For
example, if a 95% confidence interval for ρ is desired which is no greater than
0.2, then a minimum sample of size 367 would guarantee that result.


Table 1. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.00

This chapter explained how confidence intervals for the correlation
coefficient may be obtained. The next chapter will present methods to
determine the needed sample size for estimating a correlation coefficient by
using confidence intervals.


III. SAMPLE SIZE FOR ESTIMATING A CORRELATION COEFFICIENT USING
CONFIDENCE INTERVALS


In Chapter II a discussion of the Pearson product-moment correlation
coefficient, estimation of the population correlation coefficient, and confidence
intervals for the population correlation coefficient was conducted. This chapter will
study sample size determination for estimating the correlation coefficient,
using the confidence intervals and the normal approximation method that were
explained in Chapter II. Then, we will discuss how computer programs, graphs
and tables can be developed and used to determine the required sample
size to obtain a desired 95% confidence interval for a sample correlation
coefficient value. The final part of this chapter will show the required sample
sizes for different sample correlation coefficient values.


A. SAMPLE SIZE DETERMINATION USING THE NORMAL APPROXIMATION
METHOD FOR THE ESTIMATED CORRELATION COEFFICIENT VALUE

Suppose a confidence interval of size 2A is desired for the correlation
coefficient. Then, from Equation 2.17,

2A = Upper Confidence Limit − Lower Confidence Limit

$$2A = \frac{U_1 - 1}{U_1 + 1} - \frac{L_1 - 1}{L_1 + 1}, \tag{3.1}$$

where

$$U_1 = \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) + 1.96\,\sigma(Z)\right]\right\} \tag{3.2}$$

and

$$L_1 = \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+R}{1-R}\right) - 1.96\,\sigma(Z)\right]\right\}, \tag{3.3}$$

and from Equation 2.11

$$\sigma(Z) = \frac{1}{\sqrt{n-3}}. \tag{3.4}$$

Thus, the 95% confidence interval size (2A) will be equal to

$$2A = \frac{\exp\left\{\ln\left(\frac{1+R}{1-R}\right) + \frac{2(1.96)}{\sqrt{n-3}}\right\} - 1}{\exp\left\{\ln\left(\frac{1+R}{1-R}\right) + \frac{2(1.96)}{\sqrt{n-3}}\right\} + 1} - \frac{\exp\left\{\ln\left(\frac{1+R}{1-R}\right) - \frac{2(1.96)}{\sqrt{n-3}}\right\} - 1}{\exp\left\{\ln\left(\frac{1+R}{1-R}\right) - \frac{2(1.96)}{\sqrt{n-3}}\right\} + 1}. \tag{3.5}$$

If Equation 3.5 could be solved for n in terms of 2A, there would exist a closed
expression by which the needed sample size could be computed. However, it
is very hard or impossible to solve Equation 3.5 for n in terms of 2A, because
of its complexity. Although a closed expression for n could not be obtained,
a table can still be constructed using n as the independent variable and
solving for 2A. Such a table could then be used to estimate the needed sample
size, given a value for 2A.
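
The tabulation just described, with n as the independent variable and 2A computed from it, can be sketched as follows. This is a Python illustration under stated assumptions, not the program used for the thesis tables; the names `ci_width` and `width_table` are hypothetical. The sketch uses the identity (U − 1)/(U + 1) = tanh(x) for U = exp(2x), so each confidence limit is the hyperbolic tangent of a limit for E(Z).

```python
import math


def ci_width(r, n, z_crit=1.96):
    """Interval size 2A for a guessed R and sample size n (Equation 3.5)."""
    if not -1.0 < r < 1.0:
        raise ValueError("R must lie strictly between -1 and +1")
    if n <= 3:
        raise ValueError("n must exceed 3")
    half = z_crit / math.sqrt(n - 3)   # 1.96 * sigma(Z)
    z = math.atanh(r)                  # Fisher Z of the guessed R
    # Upper limit minus lower limit, each mapped back from the Z scale.
    return math.tanh(z + half) - math.tanh(z - half)


def width_table(r, sample_sizes):
    """List of (n, 2A) rows for one guessed value of R."""
    return [(n, ci_width(r, n)) for n in sample_sizes]


# Illustrative sample sizes; the grid is an arbitrary choice.
for n, width in width_table(0.90, [10, 20, 30, 50, 100]):
    print(f"n = {n:4d}   2A = {width:.3f}")
```

Reading such a table backwards, from a desired 2A to the corresponding n, gives the sample size estimate without inverting Equation 3.5.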

However, a major difficulty still remains. From the form of U_1 and L_1 in
Equations 3.2 and 3.3, it is seen that in subtracting the lower confidence limit
from the upper confidence limit to obtain 2A, the sample result R does not
vanish. Therefore, looking at Equation 3.5, to determine the required sample
size, an estimate of the sample correlation coefficient value must be made.

It is a curious result that in order to determine the sample size
needed to estimate a correlation coefficient ρ by a value R, a first guess must
be made at the result R of a sample not yet taken. However, in many cases
some advance knowledge about R will be available, e.g., whether it is positive
or negative, or whether it is greater or less than 0.5. It may not be likely that
R will be very high (say, R < 0.8). In any event, the tables will show that n
is not extremely sensitive to the guessed value of R.

For example, suppose that R = 0.975 is estimated by the decision maker,
and also suppose that the decision maker desires a confidence interval size
of 0.10. Then the decision maker can find n = 10 from Table 2. It is
important to note that, when R = -0.975, the confidence interval size is the
same as with R = 0.975, but as can be seen from Table 2 and Table 3 on page
14, they have different upper and lower bounds because of the sign. For
example, the values of the lower and the upper bounds will be 0.89 and 0.99
for R = 0.975 and -0.99 and -0.89 for R = -0.975. Because of this symmetry,
negative sample correlation coefficient values will not be discussed for the rest
of the study. Also, since our purpose is finding sample size, there is no
interest in the upper and the lower bounds, but only in the confidence interval
size.
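
The symmetry in the sign of R can be checked directly from the form of the limits. The short derivation below is a sketch in the notation of this chapter, writing a = 1.96/√(n − 3) (a shorthand introduced here for convenience) and using the identity (e^{2x} − 1)/(e^{2x} + 1) = tanh x together with the fact that tanh and its inverse are odd functions.

```latex
\begin{aligned}
2A(R)  &= \tanh\left(\tanh^{-1}R + a\right) - \tanh\left(\tanh^{-1}R - a\right),\\
2A(-R) &= \tanh\left(-\tanh^{-1}R + a\right) - \tanh\left(-\tanh^{-1}R - a\right)\\
       &= -\tanh\left(\tanh^{-1}R - a\right) + \tanh\left(\tanh^{-1}R + a\right)
        = 2A(R).
\end{aligned}
```

The two limits change sign and exchange roles, which is why the bounds for R = -0.975 are the negatives of those for R = 0.975 while the interval size is unchanged.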


Table 2. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.975

Table 3. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = -0.975

As a second example, suppose the estimate of the sample correlation
coefficient is 0.90 and we calculate the 95% confidence interval using the
normal approximation method. For a given sample size, this will yield the
confidence limits. Table 4 on page 16 shows the sample sizes required
to obtain different 95% confidence interval sizes (values of 2A) when
R = 0.90.

An APL program, named “Tez”, was written to obtain the sample size, the
upper and the lower confidence limits, and the confidence interval size (2A)
after inputting any estimated value for the sample correlation coefficient. For R
= 0.90, Table 4 on page 16 was constructed by executing this APL program,
and the program was used to create similar tables in this chapter and in
Appendix B. Table 5 on page 17, Table 6 on page 18 and Table 7 on page
19 show the required sample size for 95% confidence intervals using R =
0.80, R = 0.75 and R = 0.10. Tables for the other values of R are in Appendix
C.

It is important to note that when R = ±1, the Z statistic in Equation 2.9
goes to infinity. Because of this the sample size cannot be calculated for the
desired confidence interval when R = ±1. However, a guess of R = ±1 will
not be used.


Table 4. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.90
Column headings: Estimated Correlation Coefficient Value; 95% Confidence Interval Size

Table 5. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.80

Table 6. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.75

Table 7. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.10

Some graphs are provided which show the difference between sample
sizes for different guesses of the sample correlation coefficient value. These
graphs can also be used to determine the appropriate sample size for a
desired confidence interval. Figure 1 on page 21 shows the sample size and
confidence interval size for R = 0.95 and R = 0.90. From this figure it is clear
that sample size increases as R decreases. Also, there is a high sensitivity
of sample size to the guess of R. However, when our guess of the correlation is
smaller (say, R less than 0.6), then n will not be as sensitive. In reality, we
might assume R is not likely to be very high (say, R < 0.8).

Figure 1. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.95 AND R = 0.90


Figure 2 on page 22 shows the sample size and the confidence interval size for
R = 0.65 and R = 0.45, and Figure 3 on page 23 shows the sample size and
the confidence interval size for R = 0.55 and R = 0.35. From these two
figures we can find the required sample sizes approximately, and we see that
n is not too sensitive to the guessed value of R. The graphs for different sample
correlation coefficient values are helpful in presenting the sensitivity
differences in the sample size. Other graphs with different sample correlation
coefficient values are in Appendix D.

Figure 2. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.65 AND R = 0.45

Figure 3. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
R = 0.55 AND R = 0.35


B. COMPARISON OF SAMPLE SIZES FOR DIFFERENT CORRELATION 
COEFFICIENT VALUES 

A comparison of the results for different correlation coefficient values
shows that as the correlation coefficient gets larger in absolute value, the
required sample size gets smaller for a desired confidence interval size.
Table 8 on page 25 shows the results obtained from the computer program for
different combinations of sample correlation coefficient estimates and
confidence interval sizes. For example, if a confidence interval size 2A of 0.15
is desired then the required sample size is 20 for R = 0.925, 30 for R = 0.90,
171 for R = 0.70, 363 for R = 0.5 and 641 for R = 0.0. Further, a confidence
interval of size 0.2 and a prior estimate of R = 0.6 could reduce the sample
size to n = 153, which is less than half the n = 367 observations that would
be required under total uncertainty about R (estimate R = 0.0).
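
A search for the smallest n that yields a desired interval size can be sketched as below. This is an illustrative Python sketch, not the APL program of the thesis; the names `ci_width` and `smallest_n` and the optional `decimals` argument are assumptions introduced here. With `decimals=2`, the computed size is rounded to two decimals before it is compared with the target. The text does not state how the tabulated sizes were rounded, so this is only an assumption; under it the search returns the sample sizes quoted above (for example 367 for R = 0.0 and 153 for R = 0.6 at 2A = 0.2). Without rounding, a strict requirement 2A ≤ 0.2 at R = 0.0 needs a somewhat larger sample.

```python
import math


def ci_width(r, n, z_crit=1.96):
    """Interval size 2A for a guessed R and sample size n (Equation 3.5)."""
    half = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z + half) - math.tanh(z - half)


def smallest_n(r, target_width, decimals=None, n_max=100_000):
    """Smallest n whose interval size does not exceed target_width.

    decimals: if given, the size is rounded to that many decimals before
    the comparison (an assumption about how tabulated sizes are rounded).
    """
    for n in range(4, n_max + 1):      # the variance 1/(n-3) needs n > 3
        width = ci_width(r, n)
        if decimals is not None:
            width = round(width, decimals)
        if width <= target_width:
            return n
    raise ValueError("no sample size up to n_max reaches the target size")


for guess in (0.0, 0.6, 0.9):
    strict = smallest_n(guess, 0.2)
    rounded = smallest_n(guess, 0.2, decimals=2)
    print(f"R = {guess:.1f}: strict n = {strict}, rounded n = {rounded}")
```

The gap between the strict and the rounded search shows how sensitive the tabulated n is to the rounding convention when the interval size changes slowly with n.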

How can this table be used when the sample correlation is not yet known?
The table can provide general guidance to relieve some of the mystery in
choosing the size of a sample. For example, if a maximum confidence interval
of size 0.2 is desired and the variables are assumed to be highly correlated,
a sample of 50 should work, while if the correlation is assumed small, then
several hundred observations will be needed.


Table 8. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL

This chapter has discussed the problem of determining a sample size to
estimate a correlation coefficient for a desired confidence interval, when
Pearson's R is used. The next chapter will present two nonparametric
measures of correlation (Spearman's and Kendall's test statistics) and explore
the problem of finding a sample size for these cases.


IV. NONPARAMETRIC MEASURES OF CORRELATION, AND SAMPLE SIZE 


In the late 1930's a different approach to the problem of finding
probabilities began to gather some momentum. This new package of statistical
procedures became known as “nonparametric statistics,” and the methods
often involve less computational work, and therefore are often easier and 
quicker to apply than other statistical methods. [Ref. 5: p.3] 

Included among methods described as “nonparametric” are procedures
providing a measure of correlation when the bivariate data (X, Y) are on strict
ordinal scales. An example of such data for sample size five could be

X   Y
2   4
3   2
5   3
1   1
4   5

where X_i < X_j would imply that X_j possesses more of the property being
measured than X_i, and Y_i < Y_j would imply that Y_j possesses more of the
property than Y_i. Ordinal data can arise directly from the measuring
procedure in the experiment, or can be obtained from interval or ratio scaled
data. An example of the latter would be bivariate data of the temperatures in
centigrade in Istanbul and Izmir for five days:


x Y 
PLS. ol 
25 Ze 
30 24 
ea 20 
of 3. 


When reduced from an interval to an ordinal scale, these data would be as in the
example shown above.


In the previous chapter we discussed sample size determination for 
estimating a correlation coefficient using confidence intervals and compared 
the sample sizes for different sample correlation coefficient values. In general, 
the sampling distribution of R depends upon the form of the bivariate
population from which the sample of pairs is drawn. More importantly, 
Pearson’s R as a correlation measure requires that data be on an interval or 
ratio scale. 

Here we will discuss the Spearman and Kendall measures of correlation. 
First, some of the theory and examples of Spearman’s measure of correlation 
will be provided. Then a discussion will be conducted in the use of the normal 
approximation method with Spearman’s r, and how confidence intervals can 
be constructed. The next part of this chapter will summarize the theory and 
give examples of Kendall's measure of correlation. Likewise, the use of the
normal approximation method to find confidence intervals with Kendall's τ_s
will be presented. The final part of this chapter will look at the results that can
be obtained from Pearson's R, Spearman's r, and Kendall's τ_s, and compare
the sample sizes obtained from these three methods.


A. SPEARMAN’S R 

For this thesis, we let “r” be the notation for Spearman's coefficient of rank
correlation. It is usually designated by ρ, but the use of ρ here would cause
confusion between the population correlation coefficient and Spearman's rho.

In general, the sampling distribution of R depends upon the bivariate
population from which the sample of pairs is drawn. But suppose that the X
observations are ranked from smallest to largest using the integers 1, 2, 3, ..., n,
and the Y observations are ranked the same way. In other words, each
observation is assigned a rank according to its magnitude relative to the
others in its own group. Then the data consist of n sets of paired ranks, and
using these pairs, R as defined in Equation 2.5 can be calculated. The
resulting statistic is called Spearman's coefficient of rank correlation (r). The
difference between Pearson's R and Spearman's r is that Spearman's r measures
the degree of correspondence between rankings instead of actual variate
values. However, it can still be considered a measure of correlation between
X and Y in the continuous bivariate population. Let


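R(X_i) = rank(X_i),

and

R(Y_i) = rank(Y_i).

Spearman's coefficient of rank correlation is

$$r = \frac{12\sum_{i=1}^{n}\left[R(X_i)-\bar{R}(X)\right]\left[R(Y_i)-\bar{R}(Y)\right]}{n(n^2-1)}, \tag{4.1}$$

and if the data are replaced by their ranks, then X̄ and Ȳ correspond to R̄(X)
and R̄(Y), which can be calculated as

$$\bar{R}(X) = \frac{1}{n}\sum_{i=1}^{n} R(X_i) = \frac{1}{n}\sum_{i=1}^{n} i = \frac{n+1}{2}, \tag{4.2}$$

and in the same way

$$\bar{R}(Y) = \frac{n+1}{2}. \tag{4.3}$$

Thus

$$r = \frac{12\sum_{i=1}^{n}\left[R(X_i)-\frac{n+1}{2}\right]\left[R(Y_i)-\frac{n+1}{2}\right]}{n(n^2-1)} \tag{4.4}$$

[Ref. 5: p.246]. An equivalent but computationally easier form is given by

$$r = 1 - \frac{6\sum_{i=1}^{n}\left[R(X_i)-R(Y_i)\right]^2}{n(n^2-1)}, \tag{4.5}$$

and if we take T = Σ[R(X_i) − R(Y_i)]², then Equation 4.5 will be

$$r = 1 - \frac{6T}{n(n^2-1)}. \tag{4.6}$$

It is important to note that Equations 4.5 and 4.6 are equivalent to 4.4 only if
there are no ties.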


If a small number of ties are present in the data, Equations 4.5 and 4.6 can
be used because of their simplicity, and there will be very little difference
between the two coefficients obtained from 2.5 and 4.5. If there are many ties,
then Pearson's R in Equation 2.5 should be used on the ranks as described
below. In this manner, X̄ and Ȳ correspond to R̄(X) and R̄(Y) as explained before,
and, in the absence of ties, Σ(X_i − X̄)² and Σ(Y_i − Ȳ)² correspond to

$$\sum_{i=1}^{n}\left[R(X_i)-\frac{n+1}{2}\right]^2 = \sum_{i=1}^{n}\left[i^2 - i(n+1) + \frac{(n+1)^2}{4}\right] = \frac{n(n^2-1)}{12}. \tag{4.7}$$

In the same way

$$\sum_{i=1}^{n}\left[R(Y_i)-\frac{n+1}{2}\right]^2 = \frac{n(n^2-1)}{12}. \tag{4.8}$$

Thus Equation 2.5 becomes Equation 4.4, and this means that Pearson's R
reduces to Spearman's r when the data are replaced by their ranks.

The following is an example to see the difference between Pearson's R and
Spearman's r. Take the 12 pairs (86,88), (71,77), (77,76), (68,64),
(91,96), (72,72), (77,65), (91,90), (70,65), (71,80), (88,81), (87,72). Suppose these
are the math and the English scores of 12 students. The math scores of the
students were ranked among themselves,

X_i = 68 70 71 71 72 77 77 86 87 88 91 91,

and the English scores of the students were ranked among themselves,

Y_i = 64 65 65 72 72 76 77 80 81 88 90 96.

There are 3 pairs of ties in the X variable and 2 pairs of ties in the Y variable. The
pairs of ties are given the average rank for each pair. For example, the
first tie occurs when the X variable is 71; thus the rank will be (3 + 4)/2 = 3.5. The
other pairs of ties were similarly ranked and the general result is

R(X_i) = 8, 3.5, 6.5, 1, 11.5, 5, 6.5, 11.5, 2, 3.5, 10, 9

and

R(Y_i) = 10, 7, 6, 1, 12, 4.5, 2.5, 11, 2.5, 8, 9, 4.5.

By using these values we can calculate

[R(X_i) − R(Y_i)]² = 4, 12.25, 0.25, 0, 0.25, 0.25, 16, 0.25, 0.25, 20.25, 1, 20.25

and then calculate the statistic T in Equation 4.6 as

$$T = \sum_{i=1}^{12}\left[R(X_i)-R(Y_i)\right]^2 = 75.$$

Then r is obtained from Equation 4.6 as

$$r = 1 - \frac{6(75)}{12(143)} = 0.7378.$$

Using Equation 4.4 to calculate the r value results in r = 0.729, and using
Equation 2.5 to calculate Pearson's R on the ranks gives R =
0.7354. As can be seen, there is only a very small difference between these
values.
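
The three values in this example can be reproduced with a short Python sketch. The function names are hypothetical and introduced only for illustration; ties receive the average of the ranks they occupy, as in the text, and the score lists are the 12 pairs of the example.

```python
import math


def average_ranks(values):
    """Ranks from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1      # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman_simple(xs, ys):
    """Equation 4.6: r = 1 - 6T / (n(n^2 - 1))."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    t = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * t / (n * (n * n - 1))


def spearman_centered(xs, ys):
    """Equation 4.4: centered rank products over n(n^2 - 1)/12."""
    n = len(xs)
    mid = (n + 1) / 2
    rx, ry = average_ranks(xs), average_ranks(ys)
    return 12 * sum((a - mid) * (b - mid) for a, b in zip(rx, ry)) / (n * (n * n - 1))


def pearson_on_ranks(xs, ys):
    """Pearson's R (Equation 2.5) applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den


math_scores = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71, 88, 87]
english_scores = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80, 81, 72]
for label, func in (("Eq. 4.6", spearman_simple),
                    ("Eq. 4.4", spearman_centered),
                    ("Eq. 2.5 on ranks", pearson_on_ranks)):
    print(f"{label}: {func(math_scores, english_scores):.4f}")
```

With ties present the three formulas differ slightly, which is the point of the example; without ties they coincide.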


B. CONFIDENCE INTERVALS FOR CORRELATION COEFFICIENT WHEN WE 
USE SPEARMAN’S R 

If X and Y are independent and continuous, then the population correlation
coefficient will be equal to zero, and the expected value
of the sample correlation coefficient will essentially be zero too, because
E[R] = ρ. The variance of the sample correlation coefficient then depends only
on the sample size, and from Equation 2.7 it is clear that as the sample size gets
bigger the variance of the sample correlation coefficient will approach zero.

To find a confidence interval for the population correlation coefficient by
using Spearman's r, the statistic will be

$$Z = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \tanh^{-1} r, \tag{4.9}$$

which is distributed approximately normally with expected value

$$E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) \tag{4.10}$$

and variance

$$\sigma^2 = \frac{1}{n-3} \tag{4.11}$$

[Ref. 6: p.463].
Using this transformation, the confidence interval for ρ can be found.
Having calculated the estimate for ρ, namely r, we can compute Z and the
statistic

$$K_2 = \left[Z - \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\right]\sqrt{n-3}, \tag{4.12}$$

which is approximately normally distributed with expected value equal to 0.0
and variance equal to 1.
Using the normal approximation, there is 95% certainty that

$$-1.96 < \frac{Z - E(Z)}{\sigma} < 1.96, \tag{4.13}$$

and the 95% confidence interval of E(Z) will be

$$\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) - 1.96\,\sigma < E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) + 1.96\,\sigma. \tag{4.14}$$

Equation 4.10 may be used to obtain

$$\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) - 1.96\,\sigma\right]\right\} < \frac{1+\rho}{1-\rho} \tag{4.15a}$$

and

$$\frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) + 1.96\,\sigma\right]\right\}. \tag{4.15b}$$

If the left side of 4.15a is L_2 and the right side of 4.15b is U_2, then

$$L_2 < \frac{1+\rho}{1-\rho} < U_2, \tag{4.16}$$

and from this equation the 95% confidence interval for ρ will be

$$\frac{L_2 - 1}{L_2 + 1} < \rho < \frac{U_2 - 1}{U_2 + 1}. \tag{4.17}$$

Spearman's r can be used to find a confidence interval for a population
correlation coefficient by using the normal approximation method. It is very
important to note that this approximation assumes the observations (X, Y)
are independent. If these bivariate observations are independent, then the
measures of correlation (Pearson's R and Spearman's r) will be almost
equal. Thus, both of these methods can be used to find a confidence interval.
If the observations are not independent, then Spearman's r cannot be used in
place of Pearson's R. Again, the largest sample size for a desired interval size
will occur when r = 0, and we call this the worst case.


C. KENDALL’S TAU 

Another measure of correlation is Kendall's tau (τ_s), which is usually
considered more difficult to compute than Spearman's r. The basic advantage
of Kendall's τ_s is that its distribution approaches the normal distribution quite
rapidly, so that the normal approximation is better for Kendall's τ_s than it is for
Spearman's r. Another advantage of the Kendall test statistic is its direct and
simple interpretation in terms of probabilities of observing concordant and
discordant pairs. [Ref. 5: p.356]

For any two independent pairs of random variables (X_i, Y_i) and (X_j, Y_j), we
denote by p_c and p_d the probabilities of concordance and discordance. Two
observations, for example (2.3, 3.5) and (2.6, 1.7), are called concordant if both
members of one observation are larger than their respective members of the
other observation, and are called discordant otherwise. The probabilities p_c
and p_d can be defined as


34 


Pe = PILIX) < X) NY; < YIU LOG > XH) A (> YI} 
= P[(X; — X)(¥; — ¥;) > 0] (4.18) 
= PUX;< X) NY; < YI + PLX > Xx) NY > Yp. 


and 


Pg = PIX; — XY; — ¥) < 0] 


= PLUX)< X) MY > ¥)] + PLOG > YIN < YP) oe 


[Ref. 4: p.208]. 
If there is a perfect correlation between X and Y, then there is either perfect concordance or perfect discordance. The Kendall coefficient $\tau$ is defined as the difference

$$\tau = p_c - p_d. \qquad (4.20)$$

If the marginal probability distributions of X and Y are continuous, so that the possibility of ties $X_i = X_j$ or $Y_i = Y_j$ within groups is eliminated, we have

$$\begin{aligned}
p_c &= \{P(Y_i < Y_j) - P[(X_i > X_j) \cap (Y_i < Y_j)]\} + \{P(Y_i > Y_j) - P[(X_i < X_j) \cap (Y_i > Y_j)]\} \\
    &= P(Y_i > Y_j) + P(Y_i < Y_j) - p_d \\
    &= 1 - p_d.
\end{aligned} \qquad (4.21)$$

In this case, $\tau$ can be expressed as

$$\tau = 2p_c - 1 = 1 - 2p_d \qquad (4.22)$$

[Ref. 4: p.208].
If X and Y are independent and continuous random variables, then $p_c$ must be equal to $p_d$, and so we find $\tau = 0$. This means that for independent and continuous random variables $\tau$ will be equal to zero. In general, the converse is not true. [Ref. 4: p.205]
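That the converse fails can be seen from a small constructed example. The sketch below is an illustration only: the data are hypothetical and discrete (so ties occur, unlike the continuous case discussed above), and the helper name `concordance_counts` is an assumption. Here Y is completely determined by X, yet the concordant and discordant pairs balance exactly, so the difference between their proportions is zero.

```python
from itertools import combinations

def concordance_counts(pairs):
    """Count concordant and discordant pairs; tied pairs count as neither."""
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            c += 1
        elif product < 0:
            d += 1
    return c, d

# Hypothetical data: Y = X squared on points placed symmetrically around zero.
xs = [-3, -2, -1, 1, 2, 3]
pairs = [(x, x * x) for x in xs]

c, d = concordance_counts(pairs)
n = len(pairs)
tau = 2 * (c - d) / (n * (n - 1))
print(f"concordant = {c}, discordant = {d}, tau = {tau}")
```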

All this explanation is about the population. However, we are interested in the sample. If there are n observations, then these n observations may be paired in $\binom{n}{2} = \frac{n(n-1)}{2}$ different ways. Suppose we compare all pairs and determine the number of concordant pairs and the number of discordant pairs. Let c be the number of concordant pairs. Then an unbiased estimate of $p_c$ will be

$$\hat{p}_c = \frac{2c}{n(n-1)}. \qquad (4.23)$$

Now let d be the number of discordant pairs and then

$$\hat{p}_d = \frac{2d}{n(n-1)} \qquad (4.24)$$

will give an unbiased estimate of $p_d$. A measure of correlation of the sample will be

$$\tau_s = \hat{p}_c - \hat{p}_d \qquad (4.25)$$

[Ref. 4: p.210]. This is Kendall's sample tau coefficient $\tau_s$, which is an unbiased estimator of the parameter $\tau$ in any bivariate distribution. "It is important to note that the variance of $\tau_s$ approaches zero as the sample size approaches infinity." [Ref. 4: p.211]

Using the same data that we used in the Spearman example to calculate r, Kendall's $\tau_s$ will be calculated. Arrangement of the data $(X_i, Y_i)$ according to increasing values of X gives these pairs of observations: (68, 64), (70, 65), (71, 77), (71, 80), (72, 72), (77, 65), (77, 76), (86, 88), (87, 72), (88, 81), (91, 90), (91, 96). There are ties in X scores 71, 77 and 91. We calculate

$$c = 11 + 9 + 4 + 4 + 5 + 5 + 4 + 2 + 3 + 2 + 0 + 0 = 49$$

and

$$d = 0 + 0 + 4 + 4 + 1 + 0 + 1 + 2 + 0 + 0 + 0 + 0 = 12$$

and by using Equations 4.23 and 4.24 we find that

$$\hat{p}_c = \frac{2(49)}{12(11)} = 0.7424, \qquad \hat{p}_d = \frac{2(12)}{12(11)} = 0.1818.$$

From Equation 4.25,

$$\tau_s = \hat{p}_c - \hat{p}_d = 0.7424 - 0.1818 = 0.5606$$

estimates a positive correlation between these variables. We already found r = 0.7378 with the same data. In general, the absolute value of Spearman's r will tend to be larger than the absolute value of Kendall's tau. As a test of significance there is no strong reason to prefer one over the other, because both usually give almost the same result. [Ref. 5: p.251]
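The counting in this example can be verified with a short Python sketch. This is a hedged illustration, not the thesis program: the twelve pairs below are an assumed reading of the example data, the function name `kendall_tau_sample` is hypothetical, and pairs tied in X or in Y are counted as neither concordant nor discordant.

```python
from itertools import combinations

# Assumed reading of the twelve (X, Y) pairs of the example, sorted by X.
pairs = [(68, 64), (70, 65), (71, 77), (71, 80), (72, 72), (77, 65),
         (77, 76), (86, 88), (87, 72), (88, 81), (91, 90), (91, 96)]

def kendall_tau_sample(pairs):
    """Return (c, d, tau_s) following Equations 4.23 through 4.25."""
    n = len(pairs)
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:        # both members move in the same direction
            c += 1
        elif product < 0:      # members move in opposite directions
            d += 1
        # product == 0 means a tie in X or Y; the pair is not counted
    p_c_hat = 2 * c / (n * (n - 1))   # Equation 4.23
    p_d_hat = 2 * d / (n * (n - 1))   # Equation 4.24
    return c, d, p_c_hat - p_d_hat    # Equation 4.25

c, d, tau_s = kendall_tau_sample(pairs)
print(f"c = {c}, d = {d}, tau_s = {tau_s:.4f}")
```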


D. CONFIDENCE INTERVALS FOR CORRELATION COEFFICIENT WHEN WE 
USED KENDALL’S TAU 
To find a confidence interval for the population correlation coefficient by using Kendall's $\tau_s$, the Z statistic will be

$$Z = \frac{1}{2}\ln\left(\frac{1+\tau_s}{1-\tau_s}\right) \qquad (4.26)$$

which is approximately normally distributed with the expected value given in Equation 4.10 and variance given in Equation 4.11. [Ref. 6: p.463]


Again normalization on Z can be accomplished yielding

$$K_3 = \left[Z - \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\right]\sqrt{n-3} \qquad (4.27)$$

which is approximately normally distributed with expected value equal to 0 and variance equal to 1.


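Using the normal approximation, there is 95% certainty that

$$-1.96 < \frac{Z - E(Z)}{\sigma(Z)} < 1.96 \qquad (4.28)$$

and a 95% confidence interval for E(Z) will be

$$\frac{1}{2}\ln\left(\frac{1+\tau_s}{1-\tau_s}\right) - 1.96\,\sigma(Z) < E(Z) = \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < \frac{1}{2}\ln\left(\frac{1+\tau_s}{1-\tau_s}\right) + 1.96\,\sigma(Z) \qquad (4.29)$$

and

$$\exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+\tau_s}{1-\tau_s}\right) - 1.96\,\sigma(Z)\right]\right\} < \frac{1+\rho}{1-\rho} \qquad (4.30a)$$

and

$$\frac{1+\rho}{1-\rho} < \exp\left\{2\left[\frac{1}{2}\ln\left(\frac{1+\tau_s}{1-\tau_s}\right) + 1.96\,\sigma(Z)\right]\right\} \qquad (4.30b)$$

If the left side of 4.30a is called $L_3$ and the right side of 4.30b is called $U_3$, then

$$L_3 < \frac{1+\rho}{1-\rho} < U_3 \qquad (4.31)$$

and from this the 95% confidence interval for $\rho$ will be

$$\frac{L_3 - 1}{L_3 + 1} < \rho < \frac{U_3 - 1}{U_3 + 1}. \qquad (4.32)$$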


Thus Kendall's $\tau_s$ can be used in place of Pearson's R to find a confidence interval for the population correlation coefficient by using the normal approximation method explained above. Again, if Kendall's $\tau_s$ is used in place of Pearson's R, the observations need to be independent. If they are not independent, then the normal approximation method to find a sample size for a desired confidence interval cannot be used.


E. SAMPLE SIZE DETERMINATION FOR THE NONPARAMETRIC MEASURES

If the nonparametric measures of correlation are to be used to obtain a confidence interval, the bivariate observations must be independent. If they are not independent, then the nonparametric measures of correlation cannot be used. If the variables are not independent, then the population correlation coefficient will in general be different from zero, and in that case the standard normal approximation cannot be used. The only knowledge is that if X and Y come from independent bivariate observations, then use of the normal approximation method to find a confidence interval for the population correlation coefficient is valid. For this purpose, Pearson's R is used for determining sample size when using the normal approximation method that was explained in Chapter II.


F. THE RELATION BETWEEN PEARSON'S R, SPEARMAN'S R AND KENDALL'S TAU

If the data are at least interval scaled with independent observations, then all three measures of correlation can be used to find a maximum sample size for a desired confidence interval by using the normal approximation method. To see the difference between these three methods, let us use the same 12 sample data pairs we used in Chapter IV, Section A. Previous computations from the data resulted in r = 0.729, R = 0.7354 and $\tau_s$ = 0.5606. If the confidence interval for the population correlation coefficient is calculated by using R = 0.7374, the statistic will be

$$Z = \frac{1}{2}\ln\left(\frac{1+0.7374}{1-0.7374}\right) = 0.9448,$$

and standard deviation

$$\sigma(Z) = \frac{1}{\sqrt{12-3}} \approx 0.334.$$

The 95 percent confidence limits for E(Z) are

$$0.9448 - 1.96 \times 0.334 < E(Z) < 0.9448 + 1.96 \times 0.334,$$

which reduce to

$$0.2902 < E(Z) < 1.5995.$$

The inequalities can be written as

$$0.2902 < \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < 1.5995,$$

and from Equations 2.15a and 2.15b

$$\exp\{2 \times (0.2902)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.5995)\}.$$

Calculating $L_1$ = 1.7868 and $U_1$ = 24.508 and applying Equation 2.17, the confidence interval for $\rho$ will be

$$\frac{1.7868 - 1}{1.7868 + 1} < \rho < \frac{24.508 - 1}{24.508 + 1}$$

which reduces to

$$0.2823 < \rho < 0.9216.$$

So, the 95% confidence interval for $\rho$ by using Pearson's R is $0.2823 < \rho < 0.9216$, and confidence interval size (2A) is 0.6393.


If Spearman's r is used with r = 0.729, then

$$Z = \frac{1}{2}\ln\left(\frac{1+0.729}{1-0.729}\right) = 0.9266$$

and $\sigma(Z)$ is the same as with Pearson's R. The 95 percent confidence limits for E(Z) are

$$0.9266 - 1.96 \times 0.334 < E(Z) < 0.9266 + 1.96 \times 0.334$$

which reduce to

$$0.272 < E(Z) < 1.5813.$$

The inequalities can be written as

$$0.272 < \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < 1.5813,$$

and from Equations 4.15a and 4.15b

$$\exp\{2 \times (0.272)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.5813)\}.$$

Calculating $L_2$ = 1.7229 and $U_2$ = 23.632, the confidence interval for $\rho$ is

$$\frac{1.7229 - 1}{1.7229 + 1} < \rho < \frac{23.632 - 1}{23.632 + 1}$$

yielding

$$0.2655 < \rho < 0.9188.$$

So, the 95% confidence interval for $\rho$ by using Spearman's r is $0.2655 < \rho < 0.9188$, and confidence interval size (2A) is 0.6533.


Finally, using Kendall's tau, $\tau_s$ = 0.5606 gives the statistic

$$Z = \frac{1}{2}\ln\left(\frac{1+0.5606}{1-0.5606}\right) = 0.6337$$

and $\sigma(Z)$ = 0.334. The 95 percent confidence limits for E(Z) are then

$$0.6337 - 1.96 \times 0.334 < E(Z) < 0.6337 + 1.96 \times 0.334$$

which reduces to

$$-0.021 < E(Z) < 1.2884.$$

The inequalities can be written as

$$-0.021 < \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right) < 1.2884,$$

and from Equations 4.30a and 4.30b

$$\exp\{2 \times (-0.021)\} < \frac{1+\rho}{1-\rho} < \exp\{2 \times (1.2884)\}.$$

Again calculating $L_3$ = 0.9589 and $U_3$ = 13.155, the confidence interval for $\rho$ will be

$$\frac{0.9589 - 1}{0.9589 + 1} < \rho < \frac{13.155 - 1}{13.155 + 1}$$

or

$$-0.021 < \rho < 0.8587.$$

Thus the 95% confidence interval for $\rho$ using Kendall's tau is $-0.021 < \rho < 0.8587$, and confidence interval size (2A) is 0.8797.

As can be seen from these three results, Pearson's R and Spearman's r give approximately the same confidence interval size (2A). However, the confidence interval size that was obtained from Kendall's $\tau_s$ is noticeably different from the others. This seems to be a disadvantage for Kendall's tau, but Conover states that there is no strong reason to prefer one over another, because they will generally give roughly the same result. [Ref. 5: p.251]
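The arithmetic of the three worked intervals can be reproduced with the sketch below. It is an illustration under stated assumptions, not the thesis program: the function name `interval_from_estimate` is hypothetical, and the rounded standard deviation 0.334 from the worked example is used on purpose so that the output tracks the hand calculation.

```python
import math

SIGMA_Z = 0.334  # rounded standard deviation used in the worked example (n = 12)

def interval_from_estimate(estimate, sigma=SIGMA_Z, z_crit=1.96):
    """Return (lower, upper, size) of the approximate 95% interval for rho."""
    z = 0.5 * math.log((1 + estimate) / (1 - estimate))
    lower_ratio = math.exp(2 * (z - z_crit * sigma))   # the quantity called L
    upper_ratio = math.exp(2 * (z + z_crit * sigma))   # the quantity called U
    lower = (lower_ratio - 1) / (lower_ratio + 1)
    upper = (upper_ratio - 1) / (upper_ratio + 1)
    return lower, upper, upper - lower

estimates = [("Pearson R", 0.7374), ("Spearman r", 0.729), ("Kendall tau", 0.5606)]
results = {name: interval_from_estimate(value) for name, value in estimates}
for name, (lower, upper, size) in results.items():
    print(f"{name:12s} {lower:8.4f} < rho < {upper:.4f}   size = {size:.4f}")
```

The interval size is simply the upper limit minus the lower limit, so it offers a quick consistency check on any hand-computed pair of limits.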

Graphs are provided to show the difference among sample sizes from these three methods. At the same time these graphs can be used to determine the appropriate sample size for a desired confidence interval. Figure 4 shows the sample size and the confidence interval for these three methods.

Figure 4. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL SIZE BY USING DIFFERENT SAMPLE CORRELATION METHODS


A table can be developed to provide the exact value for different sample correlation coefficient methods. Table 9 shows these sample size values.


Table 9. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL SIZE BY USING DIFFERENT SAMPLE CORRELATION METHODS

Interval Size = 2A    R = 0.7354    r = 0.729    $\tau_s$ = 0.5606
0.12
0.13                  183           190
0.14                  159

This chapter explained Spearman's and Kendall's measures of correlation, and the problems of finding a sample size for a desired confidence interval size by using Spearman's r or Kendall's $\tau_s$. The next chapter will summarize this study and give some suggestions for further research and study.


V. SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH AND STUDY


In this chapter, a summary will be given of the study of sample sizes for desired confidence intervals when the classical sample correlation coefficient method (Pearson's R) and the nonparametric statistical sample correlation coefficient methods (Spearman's r and Kendall's $\tau_s$) are used. Additionally, recommendations will be made for some additional study into the reduction of the number of observations needed to obtain a desired confidence interval for the correlation coefficient.


A. SUMMARY 

This study described the classical sample correlation coefficient (Pearson's R) and the nonparametric statistical sample correlation coefficient methods (Spearman's r and Kendall's $\tau_s$) to obtain the number of samples needed to obtain a desired confidence interval size for a correlation coefficient.

First, a description was provided of the Pearson product-moment correlation coefficient, the estimated population correlation coefficient, and confidence intervals for the population correlation coefficient using the normal approximation method. In the next chapter, it was shown how the sample size for estimating a correlation coefficient using the confidence interval could be obtained, and a comparison was done of these results for different sample correlation coefficient values. The result had the limitation that one must guess at the sample result before taking the sample, but it was still possible to give general results about the magnitude of needed sample sizes. In Chapter IV, the Spearman and Kendall statistical sample correlation coefficient methods were described. The analysis concluded that there is no way to find a sample size by using the Spearman and Kendall statistical sample correlation coefficient methods when rho is not equal to zero, due to the absence of any information about the cumulative distribution function when the population correlation coefficient is nonzero. Similarly, values for probabilities, expected values, and variances could not be determined. However, most of the time the value of the population correlation coefficient is unknown. If the observations are independent, then a sample size for a desired confidence interval using nonparametric measures of correlation can be found. If the observations are not independent, then the normal approximation method cannot be used for nonparametric statistics to find a needed sample size, and instead Pearson's R must be used to find a sample size.

To use the normal approximation method, the decision maker must estimate the measure of correlation value, and then determine the desired sample size for a confidence interval of size (2A). In order to find a sample size for the desired confidence interval, the sample correlation coefficient must first be estimated without any data.

The results for different sample correlation coefficient values were compared, and it was observed that as the sample correlation coefficient gets bigger in absolute value the required sample size gets smaller, and the largest sample size that could result will occur when R equals zero.
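The relationship between the estimated correlation and the needed sample size can be sketched as a simple search. This is a minimal Python illustration, not the APL program of the thesis: the function names and the search limit `n_max` are hypothetical, and sigma(Z) is assumed to be 1/sqrt(n - 3).

```python
import math

def interval_size(r, n, z_crit=1.96):
    """Width 2A of the approximate interval, assuming sigma(Z) = 1/sqrt(n - 3)."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z + half) - math.tanh(z - half)

def required_sample_size(r, target_size, z_crit=1.96, n_max=100_000):
    """Smallest n whose interval size does not exceed target_size."""
    for n in range(4, n_max + 1):
        if interval_size(r, n, z_crit) <= target_size:
            return n
    raise ValueError("target_size not reached within n_max observations")

# Required n for a desired interval size 2A = 0.2 at several estimated R values.
for r in (0.0, 0.3, 0.6, 0.9):
    print(f"R = {r:.1f}: n = {required_sample_size(r, 0.2)}")
```

A linear search is used because the interval size decreases steadily as n grows, so the first n that meets the target is the smallest one.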

Computer programs were developed to calculate sample sizes for a desired confidence interval for different sample correlation coefficient values, and some tables and graphs giving the sample size needed for different R values were generated. These tables and graphs can be used by a decision maker to assist in determining the desired sample size.
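Tables of this kind can be generated along the following lines. The sketch is a rough Python analogue written for illustration, not the original program: the function name `table_rows`, the choice R = 0.50, and the grid of sample sizes are all assumptions, as is sigma(Z) = 1/sqrt(n - 3).

```python
import math

def table_rows(r, sample_sizes, z_crit=1.96):
    """Rows of (n, lower limit, upper limit, interval size) for a fixed r."""
    rows = []
    z = math.atanh(r)                        # 0.5 * ln((1 + r) / (1 - r))
    for n in sample_sizes:
        half = z_crit / math.sqrt(n - 3)     # 1.96 * sigma(Z)
        lower = math.tanh(z - half)
        upper = math.tanh(z + half)
        rows.append((n, lower, upper, upper - lower))
    return rows

# Hypothetical grid: R = 0.50 and sample sizes 10, 20, ..., 200.
rows = table_rows(0.50, range(10, 201, 10))
print(f"{'n':>5} {'lower':>8} {'upper':>8} {'2A':>8}")
for n, lower, upper, size in rows:
    print(f"{n:>5d} {lower:>8.4f} {upper:>8.4f} {size:>8.4f}")
```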


B. SUGGESTIONS FOR FURTHER STUDY 

In this study, 95% confidence intervals were used. It would be useful if
tables and graphs were developed for other confidence levels, such as
Bio, of. /o and 99°. 
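Extending the method to another confidence level only requires replacing the value 1.96 by the matching standard normal quantile. The sketch below shows one way to obtain it with the Python standard library. The function name `critical_value` is hypothetical, and the levels printed are chosen only for illustration.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Two-sided standard normal critical value for the given confidence level."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    # The central area is confidence_level, so the upper quantile is at 0.5 + level / 2.
    return NormalDist().inv_cdf(0.5 + confidence_level / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.4f}")
```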

The discussion about nonparametric statistics in this study centered on the Spearman and Kendall test statistics. It was not concluded that these methods needed smaller sample sizes than the classical Pearson's method. Additional research could be done searching for appropriate sample sizes for other nonparametric statistics.

It is sincerely hoped that the information about sample size needed to estimate correlation coefficients, and the tables, graphs and computer programs in this thesis, will be beneficial to decision makers in deciding the sample size for a desired confidence interval when estimating a correlation coefficient.


APPENDIX A. TABLE FOR CONFIDENCE BELTS FOR THE CORRELATION COEFFICIENT

Figure 5. 95% CONFIDENCE BELTS FOR THE CORRELATION COEFFICIENT: The vertical axis of this figure shows $\rho$, the horizontal axis shows R.

[Ref. 6: p.545].


APPENDIX B. THE APL PROGRAM "TEZ" USED TO COMPUTE CONFIDENCE INTERVAL FOR DESIRED SAMPLE CORRELATION COEFFICIENT VALUE

THIS PROGRAM COMPUTES THE CONFIDENCE INTERVAL WITH
ESTIMATED SAMPLE CORRELATION COEFFICIENT VALUE FOR
DIFFERENT SAMPLE SIZE. TO RUN THE PROGRAM, ENTER
DESIRED CORRELATION COEFFICIENT. IT TERMINATES THE
EXECUTION WHEN THE SAMPLE SIZE IS > 200. FOR BIGGER
NUMBERS, THE VALUE OF N IN LINE 29 MUST BE INCREASED
TO DESIRED SAMPLE SIZE. IF CONFIDENCE LEVEL DIFFERENT
THAN 95 PERCENT, THEN THE STANDARD PROBABILITY VALUE
MUST BE CHANGED IN LINE 15 AND 16. IT IS IMPORTANT


APPENDIX C. TABLES FOR DESIRED SAMPLE SIZE USING DIFFERENT ESTIMATED SAMPLE CORRELATION COEFFICIENT VALUES AND A 95% CONFIDENCE LEVEL

[Table bodies not reproduced. Column headings: Estimated Sample Correlation Coefficient Value; Lower Confidence Limits; Upper Confidence Limits; 95% Confidence Interval Size = 2A.]

Table 10. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.95
Table 11. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.925
Table 12. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.85
Table 13. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
Table 14. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
Table 15. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.60
Table 16. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
Table 17. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.50
Table 18. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
Table 19. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN
Table 21. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.30
Table 22. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.25
Table 23. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.20
Table 24. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.15
Table 25. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN


APPENDIX D. GRAPHS THAT CAN BE USED TO DETERMINE SAMPLE SIZES TO ESTIMATE CORRELATION COEFFICIENT VALUES

[Graphs not reproduced. Horizontal axis: CONFIDENCE INTERVAL SIZE = 2A.]

Figure 6. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.925 AND R = 0.85
Figure 7. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.75 AND R = 0.60
Figure 8. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.70 AND R = 0.50
Figure 9. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.40 AND R = 0.20
Figure 10. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.30 AND R = 0.10
Figure 11. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.25 AND R = 0.05
Figure 12. REQUIRED SAMPLE SIZE FOR A 95% CONFIDENCE INTERVAL WHEN R = 0.15 AND R = 0.00